Life tables, Lee-Carter Modeling

Life tables
Lee Carter
Mortality quotients
SVD
Published

September 5, 2024

Objectives

Data sources

Life data tables are downloaded from https://www.mortality.org.

See also https://www.lifetable.de.

If you install and load package https://cran.r-project.org/web/packages/demography/index.html, you will also find life data tables.

We investigate life tables describing countries from Western Europe (France, Great Britain –actually England and Wales–, Italy, the Netherlands, Spain, and Sweden) and the United States.

We load the one-year life tables for the female, male, and whole populations of the different countries.

Code
life_table |>
  dplyr::mutate(Country = forcats::as_factor(Country)) |>
  dplyr::mutate(Country = forcats::fct_relevel(Country, "Spain", "Italy", "France", "England & Wales", "Netherlands", "Sweden", "USA")) |>
  dplyr::mutate(Gender = forcats::as_factor(Gender)) -> life_table

life_table |>
  dplyr::mutate(Area =forcats::fct_collapse(Country, 
                                            SE = c("Spain", "Italy", "France"), 
                                            NE = c("England & Wales", "Netherlands", "Sweden"), 
                                            USA="USA")) -> life_table

Check on http://www.mortality.org the meaning of the different columns:

Document Tables de mortalité françaises pour les XIXe et XXe siècles et projections pour le XXIe siècle contains detailed information on the construction of Life Tables for France.

Two kinds of life tables can be distinguished: tables du moment (period tables), which contain, for each calendar year, the mortality risks at different ages for that very year; and tables de génération (cohort tables), which contain, for a given birth year, the mortality risks to which an individual born during that year has been exposed.

The life tables investigated in this homework are tables du moment. According to the document by Vallin and Meslé, building the life tables required decisions and doctoring.

See (among other things)

  • p. 19 Abrupt changes in mortality quotients at some ages for a given calendar year
  • Estimating mortality quotients at great age.

Have a look at the Lexis diagram.
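The distinction between period and cohort tables can be read off a Lexis diagram: a period table is a vertical slice (one calendar year, all ages), while a cohort table follows a diagonal (age and calendar year increase together). The sketch below illustrates this on a made-up table; the data frame `toy` and its values are hypothetical. It also simplifies matters by identifying the cohort born in 1948 with the diagonal `Year - Age == 1948`, whereas people aged \(x\) during year \(t\) were in fact born in year \(t-x\) or \(t-x-1\).

```r
library(dplyr)

# Toy long table: one row per (Year, Age), made-up quotients (not HMD data)
toy <- expand.grid(Year = 1948:1952, Age = 0:4) |>
  mutate(qx = 0.001 * (Age + 1) + 0.0001 * (Year - 1948))

# Period view (table du moment): one calendar year, all ages
period_1950 <- toy |>
  filter(Year == 1950) |>
  arrange(Age)

# Cohort view (table de génération): follow people born in 1948,
# approximated by the diagonal Year - Age == 1948 of the Lexis diagram
cohort_1948 <- toy |>
  filter(Year - Age == 1948) |>
  arrange(Age)

print(period_1950)
print(cohort_1948)
```

Applied to `life_table`, the same filters would have to be combined with conditions on `Country` and `Gender`.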

Definitions can be obtained from www.lifeexpectancy.org. We translate them into mathematical (rather than demographic) language. Recall that the quantities define a probability distribution over \(\mathbb{N}\). This probability distribution is a construction that reflects the health situation in a population at a given time. It does not describe the sequence of health conditions experienced by a cohort (people born during a specific year).

One works with a period, or current, life table (table du moment). This summarizes the mortality experience of persons across all ages in a short period, typically one year or three years. More precisely, the death probabilities \(q(x)\) for every age \(x\) are computed for that short period, often using census information gathered at regular intervals. These \(q(x)\)’s are then applied to a hypothetical cohort of \(100 000\) people over their life span to produce a life table.

Code
life_table |> 
  filter(Country=='France', Year== 2010, Gender=='Female', Age < 10 | Age > 80)
# A tibble: 39 × 13
    Year   Age      mx      qx    ax     lx    dx    Lx      Tx    ex Country
   <int> <int>   <dbl>   <dbl> <dbl>  <int> <int> <int>   <int> <dbl> <fct>  
 1  2010     0 0.00325 0.00324  0.14 100000   324 99722 8465207  84.6 France 
 2  2010     1 0.00032 0.00032  0.5   99676    32 99660 8365484  83.9 France 
 3  2010     2 0.00015 0.00015  0.5   99645    15 99637 8265824  83.0 France 
 4  2010     3 0.00011 0.00011  0.5   99630    11 99624 8166187  82.0 France 
 5  2010     4 0.00008 0.00008  0.5   99619     8 99615 8066563  81.0 France 
 6  2010     5 0.00005 0.00005  0.5   99611     5 99608 7966948  80.0 France 
 7  2010     6 0.00008 0.00008  0.5   99606     8 99602 7867339  79.0 France 
 8  2010     7 0.00008 0.00008  0.5   99598     8 99594 7767737  78.0 France 
 9  2010     8 0.00008 0.00008  0.5   99590     8 99586 7668143  77   France 
10  2010     9 0.00007 0.00007  0.5   99582     7 99578 7568557  76   France 
# ℹ 29 more rows
# ℹ 2 more variables: Gender <fct>, Area <fct>

In the sequel, we denote by \(F_{t}\) the cumulative distribution function for year \(t\). We agree on \(\overline{F}_t = 1 - F_t\) and \(F_t(-1)=0\).

Code
life_table |> 
  filter( Year>=1948) |> 
  group_by(Country, Year, Gender) |> 
  summarise(m1 =max(abs(lx -dx -lead(lx)), na.rm = T), 
            m2 =max(abs(lx * qx -dx), na.rm=T),
            m3 =max(abs(Lx -lx * (1 + qx * (ax-1))), na.rm=T),
            m4 =max(abs(1-exp(-mx)-qx), na.rm=T)) |> 
  select(Year, Country, Gender, m1, m2, m3, m4) |>  
  ungroup() |> 
  group_by(Country, Gender) |> 
  slice_max(order_by = desc(m4), n = 1)
# A tibble: 21 × 7
# Groups:   Country, Gender [21]
    Year Country         Gender    m1    m2    m3      m4
   <int> <fct>           <fct>  <int> <dbl> <dbl>   <dbl>
 1  1948 Spain           Both       1 0.874 2.20  0.00838
 2  1948 Spain           Female     1 0.789 1.56  0.00816
 3  1952 Spain           Male       1 0.802 5.5   0.0119 
 4  2004 Italy           Both       1 0.836 0.968 0.0150 
 5  2004 Italy           Female     1 0.875 1.03  0.0149 
 6  1984 Italy           Male       1 0.774 5.56  0.0146 
 7  2007 France          Both       1 0.887 0.976 0.0152 
 8  2007 France          Female     1 0.890 0.980 0.0151 
 9  1979 France          Male       1 0.764 4.97  0.0161 
10  1992 England & Wales Both       1 0.898 2.42  0.0135 
# ℹ 11 more rows
qx
(age-specific) risk of death at age \(x\), or mortality quotient at given age \(x\) for given year \(t\): \(q_{t,x} = \frac{\overline{F}_t(x-1) - \overline{F}_t(x)}{\overline{F}_t(x-1)}\).
For each year, each age, \(q_{t,x}\) is determined by data. We also have \[\overline{F}_{t}(x+1) = \overline{F}_{t}(x) \times (1-q_{t,x+1})\, .\]
mx
central death rate at age \(x\) during year \(t\). This is connected with \(q_{t,x}\) by \[m_{t,x} = -\log(1- q_{t,x}) \,,\]

or equivalently \(q_{t,x} = 1 - \exp(-m_{t,x})\).

lx
the so-called survival function: the scaled proportion of persons alive at age \(x\). These values are computed recursively from the \(q_{t,x}\) values using the formula \[l_{t,x+1} = l_{t,x} \times (1-q_{t,x}) \, ,\] with \(l_{t,0}\), the “radix” of the table, arbitrarily set to \(100000\). Functions \(l_{t,\cdot}\) and \(\overline{F}_t\) are connected by \[l_{t,x + 1} = l_{t,0} \times \overline{F}_t(x)\,.\] Note that in probability theory, \(\overline{F}\) is also called the survival or tail function.
dx
\(d_{t,x} = q_{t,x} \times l_{t,x}\)
Lx
Total number of person-years lived by the cohort from age \(x\) to \(x+1\). This is the sum of the years lived by the \(l_{t, x+1}\) persons who survive the interval, and the \(d_{t,x}\) persons who die during the interval. The former contribute exactly \(1\) year each, while the latter contribute, on average, approximately half a year, so that \(L_{t,x} = l_{t,x+1} + 0.5 \times d_{t,x}\). This approximation assumes that deaths occur, on average, halfway through the age interval \(x\) to \(x+1\). It is satisfactory except at age 0 and at the oldest age, where other approximations are often used. We will stick to a simplified vision \(L_{t,x}= l_{t,x+1}\).
Tx
Total number of person-years lived by the cohort from age \(x\) onwards: \(T_{t,x} = \sum_{y \geq x} L_{t,y}\).
ex
Residual life expectancy at age \(x\) and year \(t\): \(e_{t,x} = T_{t,x} / l_{t,x}\).
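The column definitions above can be chained to rebuild a whole period table from the mortality quotients alone. The following is a minimal sketch on made-up values (the vectors `qx` and `ax` are hypothetical, not HMD data). It uses the `ax` column (average fraction of the year lived by those who die in the interval) rather than the simplification \(L_{t,x} = l_{t,x+1}\), which corresponds to setting `ax` to 0, and it computes the person-years remaining from age \(x\) onwards as a reverse cumulative sum.

```r
# Toy mortality quotients for ages 0..4 (made-up values, not HMD data)
qx <- c(0.004, 0.0003, 0.0002, 0.0001, 1)  # qx = 1 at the last age closes the table
ax <- c(0.14, 0.5, 0.5, 0.5, 0.5)          # fraction of the year lived by those who die
radix <- 100000

n <- length(qx)
# lx: survivors at exact age x, l_0 = radix, l_{x+1} = l_x * (1 - q_x)
lx <- radix * cumprod(c(1, 1 - qx[-n]))
# dx: deaths between ages x and x + 1
dx <- lx * qx
# Lx: person-years lived between ages x and x + 1
Lx <- (lx - dx) + ax * dx
# Tx: person-years lived from age x onwards (reverse cumulative sum of Lx)
Tx <- rev(cumsum(rev(Lx)))
# ex: residual life expectancy at age x
ex <- Tx / lx

toy_table <- data.frame(Age = seq_len(n) - 1, qx, ax, lx, dx, Lx, Tx, ex)
print(toy_table)
```

On real data the same computation would be carried out separately for each combination of `Country`, `Year`, and `Gender`.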

Loading life_table into an in-memory database

We load life_table into an in-memory database, which gives access to the full power of SQL. This is helpful if we have to use window functions.

<SQL>
SELECT `dbplyr_ep5AcBVNWu`.*
FROM `dbplyr_ep5AcBVNWu`
WHERE (`Gender` = 'Female') AND (`Country` = 'USA') AND (`Year` = 1948.0)

Object lt can be queried like any other data frame.

Code
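# Open an in-memory SQLite database and wrap it as a dplyr source
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
src <- dbplyr::src_dbi(con, auto_disconnect = TRUE)

# Copy life_table into the database; lt is a lazy reference to the remote table
lt <- dplyr::copy_to(src, life_table)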

Computing residual life expectancies at all ages can also be completed using SQL queries.
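As an illustration of the window functions mentioned above, the sketch below computes person-years remaining and residual life expectancy with a running sum over ages taken in decreasing order. It runs on a small made-up table (the name `toy_lt` and its values are hypothetical) and assumes the SQLite version bundled with RSQLite supports window functions (SQLite 3.25 or later).

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Toy period table for a single year (made-up values, not HMD data)
toy <- data.frame(
  Age = 0:4,
  lx  = c(100000, 99600, 99570, 99550, 99540),
  Lx  = c(99656, 99585, 99560, 99545, 49770)
)
dbWriteTable(con, "toy_lt", toy)

# Tx is a running sum of Lx over ages taken in decreasing order; ex = Tx / lx
res <- dbGetQuery(con, "
  SELECT Age, lx, Lx,
         SUM(Lx) OVER (ORDER BY Age DESC) AS Tx,
         1.0 * SUM(Lx) OVER (ORDER BY Age DESC) / lx AS ex
  FROM toy_lt
  ORDER BY Age
")
dbDisconnect(con)
print(res)
```

On the full table, the window would additionally need `PARTITION BY Country, Year, Gender` so that each life table is summed separately.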

Western countries in 1948

Several pictures share a common canvas: we plot mortality quotients against ages using a logarithmic scale on the \(y\) axis. Countries are identified by aesthetics (shape, color, linetype). Abiding by the DRY principle, we define a prototype ggplot (alternatively plotly) object. The prototype will be fed with different datasets, then decorated and arranged for the different figures.

Code
dummy_data <- dplyr::filter(life_table, FALSE)

proto_plot <- ggplot(dummy_data,
                     aes(x=Age,
                         y=qx,
                         col=Area,
                         linetype=Country,
                         shape=Country)) +
              scale_y_log10() +
              scale_x_continuous(breaks = c(seq(0, 100, 10), 109)) +
              ylab("Mortality quotients") +
              labs(linetype="Country") +
              theme_bw()
  • Plot qx of all Countries at all ages for years 1948 and 2013.
Code
proto_plt2 <-
  ggplot() +
  aes(x=Age, y=qx, colour=Area, frame=Year, linetype=Country) +
  geom_point(size=.1) +
  geom_line(linewidth=.1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  scale_y_log10() +
  labs(linetype=c("Country")) +
  scale_x_continuous(breaks = c(seq(0, 100, 10), 109)) +
  xlab("Age") +
  ylab("Central death rates") +
  facet_grid(cols=vars(Gender))

with(params,
  (proto_plt2 %+%
    (life_table |> 
      filter(between(Year, year_p, year_e), 
             Gender != 'Both', 
             Age < 90))  +
    ggtitle("Central death rates 1948-2013: Europe catches up"))) |>
  plotly::ggplotly()

The animated plot makes it possible to spot more details. It is useful to use color so as to distinguish three areas: USA; Northern Europe (NE) comprising England and Wales, the Netherlands, and Sweden; Southern Europe (SE) comprising Spain, Italy, and France. In 1948, NE and the USA exhibit comparable central death rates at all ages for the two genders, the USA looking like a more dangerous place for young adults. Spain lags behind, with Italy and France showing up at intermediate positions.

By year 1962, SE has almost caught up with the USA. Italy and Spain still have higher infant mortality while central death rates in the USA and France are almost identical at all ages for both genders. Central death rates attain a minimum around ages 10-12 for both genders. In Spain the minimum central death rate has been divided by almost ten between 1948 and 1962.

If we dig further, we observe that the shape of the male central death rate curve changes over time. In 1962, in the USA and France, central death rates exhibit a sharp increase between ages 12 and 18, then remain almost constant between 20 and 30, and increase again afterwards. This pattern shows up in other countries but in a less spectacular way.

Twenty years later, during years 1980-1985, death rates at age 0 have decreased to around \(1\%\) in all countries, whereas the rate was \(7\%\) in Spain in 1948. The male central death rate curve exhibits a plateau between ages 20 and 30. Central death rates at these ages look higher in France and the USA.

By year 2000, France is back amongst European countries (at least with respect to central death rates). Young adult mortality rates are higher in the USA than in Europe. This phenomenon became more pronounced during the last decade.

Plot ratios between mortality quotients (qx) in European countries and mortality quotients in the USA in 1948.
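The ratios can be obtained by joining each European row with the USA row for the same year, gender, and age. The sketch below shows one possible construction on a made-up table; the names `toy`, `usa`, `qx_usa`, and `eur_us_sketch` are hypothetical, and this is not necessarily how `eur_us_table` was built.

```r
library(dplyr)

# Toy stand-in for life_table (made-up values, two countries, one year)
toy <- tibble::tribble(
  ~Country, ~Year, ~Gender,  ~Age, ~qx,
  "USA",    1948L, "Female", 0L,   0.030,
  "USA",    1948L, "Female", 1L,   0.002,
  "Spain",  1948L, "Female", 0L,   0.060,
  "Spain",  1948L, "Female", 1L,   0.006
)

# Reference quotients: USA rows, keyed by Year, Gender, Age
usa <- toy |>
  filter(Country == "USA") |>
  select(Year, Gender, Age, qx_usa = qx)

# Attach the USA quotient to every non-USA row and take the ratio
eur_us_sketch <- toy |>
  filter(Country != "USA") |>
  inner_join(usa, by = c("Year", "Gender", "Age")) |>
  mutate(Ratio = qx / qx_usa)

print(eur_us_sketch)
```

A ratio above 1 means the mortality quotient is higher than in the USA at that age; the logarithmic \(y\) scale makes ratios above and below 1 visually symmetric.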

Code
with(params,
(eur_us_table  |>
  ggplot(aes(x=Age,
             y=Ratio,
             col=Area,
             frame=Year,
             linetype=Country)) +
  scale_y_log10() +
  scale_x_continuous(breaks = c(seq(0, 100, 10), 109)) +
  geom_point(size=.1) +
  geom_smooth(method="loess", se=FALSE, span=.1, size=.1) +
  ylab("Ratio of mortality quotients with respect to US") +
  labs(linetype="Country", color="Area") +
  # scale_colour_brewer(direction=-1) +
  ggtitle(label = stringr::str_c("European countries with respect to US,", year_p,'-', year_e, sep = " "), subtitle = "Sweden consistently ahead") +
  facet_grid(rows = vars(Gender)))) |>
  plotly::ggplotly()